Reducing the energy cost of walking with low assistance levels through optimized hip flexion assistance from a soft exosuit

As we age, humans see natural decreases in muscle force and power which leads to a slower, less efficient gait. Improving mobility for both healthy individuals and those with muscle impairments/weakness has been a goal for exoskeleton designers for decades. In this work, we discover that significant reductions in the energy cost required for walking can be achieved with almost 50% less mechanical power compared to the state of the art. This was achieved by leveraging human-in-the-loop optimization to understand the importance of individualized assistance for hip flexion, a relatively unexplored joint motion. Specifically, we show that a tethered hip flexion exosuit can reduce the metabolic rate of walking by up to 15.2 ± 2.6%, compared to locomotion with assistance turned off (equivalent to 14.8% reduction compared to not wearing the exosuit). This large metabolic reduction was achieved with surprisingly low assistance magnitudes (average of 89 N, ~ 24% of normal hip flexion torque). Furthermore, the ratio of metabolic reduction to the positive exosuit power delivered was 1.8 times higher than ratios previously found for hip extension and ankle plantarflexion. These findings motivated the design of a lightweight (2.31 kg) and portable hip flexion assisting exosuit, that demonstrated a 7.2 ± 2.9% metabolic reduction compared to walking without the exosuit. The high ratio of metabolic reduction to exosuit power measured in this study supports previous simulation findings and provides compelling evidence that hip flexion may be an efficient joint motion to target when considering how to create practical and lightweight wearable robots to support improved mobility.


• Soft Exosuit o Textiles
We designed the soft exosuit to assist hip flexion, which is slightly modified from a portable hip extension exosuit 1 . The functional apparel components of the exosuit consist of a spandex base layer (200 g), a waist belt (250 g), and two thigh wraps (2 × 146 g) (Fig. 1d). The base layer incorporates high-friction materials (NuStim, FabriFoam, USA) to help the exosuit remain attached to the wearer. We constructed the waist belt and thigh wraps with layers of an inextensible, abrasion-resistant plain-weave textile and a lightweight custom sailcloth material to evenly distribute the pressure around the iliac crests. The thigh wraps can be adjusted with laces and a tensioning dial (L4, Boa Technology, USA), and the waist belt can be tightened using a Velcro fastener. Detailed schematics of the waist belt, thigh wrap, and base layer construction can be found in the supplementary material of Kim et al. 1 . The total added mass of the system relative to condition in the main study was 0.69 kg of which 0.25 kg was concentrated on the waist (the mass of the waist belt) and 0.22 kg on each thigh (including the thigh brace, IMU, and load cells). The spandex base layer was worn in all conditions. o Off-board actuator and sensors We used a non-portable and tethered two-degrees-of-freedom (DoF) actuation system to provide the hip flexion assistance moment through Bowden cables. Anchoring points of the Bowden cable are at the bottom left and right of the waist belt and at the top center of the thigh wrap from an anterior view. On the actuator side, we attached the inner Bowden cable and its outer sheath to the pulley and pulley cover frame, respectively. On the exosuit side, we connected the inner Bowden cable and its outer sheath to the anchoring point of thigh wrap and waist belt using metal buckles (COBRA FM, AustriAlpin, Canada), respectively. When the actuator retracts, the distance between the two anchoring points on the waist belt and thigh wrap is shortened and tension is generated in the cable to assist hip flexion. Two load cells (2 × 43g) (LSB200, FUTEK, USA) are integrated into the anchoring point of thigh wrap to measure the cable force, and two IMUs (2 × 33g) (MTi-3 AHRS, Xsens, Netherlands) are secured by the plastic clip on the anterior part of thigh wrap to measure the thigh segment angle. More details of the actuator can be found in 2 . We used this off-board actuator for the main study and the repeating optimization experiment in the supplemental study.
o Controls The electronics hardware has a three-layer configuration for the control system architecture: real-time target machine (top layer; Speedgoat, Switzerland), microprocessor (middle layer; ATSAME70N21, Atmel Co, USA), and a servo motor driver (bottom layer; Gold Twitter, Elmo Motion Control, Israel), and a separate, dedicated computer is used for the HIL optimization algorithm. This optimization computer collects the participant's metabolic data in real-time using an indirect calorimetry device (K5, Cosmed, Italy), runs the CMA-ES optimization algorithm in MATLAB (MathWorks, USA), and sends the assistance profile parameters that will be explored in the next condition to the real-time target machine. More details of the control system architecture can be found in 3 .
The high-level controller in a real-time target machine uses an IMU-based iterative algorithm that detects maximum hip extension (MHE) in every stride by identifying sign change in thigh angular velocity and estimates the wearer's gait cycle based on three most recent stride times. This gait cycle is further divided into assisted and non-assisted phases. During the assisted phase, force control is used to create a hip flexion force profile with desired onset, peak, offset timings, and peak force magnitude. During the unassisted phase, position control is used to push out the cable in order not to disturb the wearer's limb motion and updates the cable position on a step-by-step basis to reach the desired pretension force just before switching to force control. While the high-level control is switching between force and position controllers, a low-level proportionalintegral (PI) controller in Simulink (MathWorks, USA) with a servo motor driver in a torque control mode is used continuously. It is important to note that for two of the shorter participants, the short distance between the two anchoring points on the thigh wrap and waist belt limited the force tracking capabilities during the maximum force condition ( -). During the assistance force was delivered with either lower peak force magnitude or earlier offset timing for 49.8% and 47.8% of the steps for participants 1 and 8, respectively. , and thigh wraps used in the main study were used as part of the portable system. The IMU-based iterative algorithm, the force and position switching controller, and the low-level PI controller were all implemented in standard C language (Visual Studio Code, Microsoft, USA). A waist belt (0.22 kg) similar in design to that used in the main study was used, with added attachments and adjustment for the miniature winch and electronics. The total weight of the system was 2.31 kg of which 1.87 kg was concentrated on the waist and 0.22 kg on each thigh. We used this portable system for the autonomous and portable exosuit experiment in the supplemental study.
• Human-in-the-loop (HIL) Optimization o Covariance matrix adaptation evolutionary strategy (CMA-ES) We used a Covariance Matrix Adaptation Evolutionary Strategy (CMA-ES) to identify three control parameters ( , , and ) ( Fig. 1a) of the exosuit force profile that minimized the metabolic cost of walking for each participant 4,5 . The main objective of the covariance matrix adaptation is to fit the search distribution to the contour lines of an objective function to be minimized so that it can quickly approach the global optimum while minimizing the number of function evaluations. An overview of the algorithm is as follows.
CMA-ES first evaluates a group of candidate parameter sets that form the population of one generation. Candidate parameter sets are sampled from a multivariate normal distribution, characterized by a mean, a covariance matrix, and a standard deviation. The mean of the distribution represents the current estimate of the optimal parameter values, and the population size ( ) is determined based on the number of parameters ( ) being optimized (e.g., = 7 for = 3). After evaluating the current generation, the parameter sets are ranked in terms of performance and a rank-weighted average of the best sets (e.g., top three performers for = 3) becomes the mean of the next generation. The locations of best parameter sets are used to update the covariance matrix of the next generation, while additional multi-generational correlation refines the covariance matrix and standard deviation. This process is repeated from generation to generation to imitate natural selection. Based on our simulation result, the eight generations were selected as a stopping criterion and a simple repair method 6 (i.e., if the sampled parameter set is outside of the boundary, this is moved to the closest feasible point) was chosen to handle boundary constraints. See Supplementary Results for more details on our simulation results. The mean of the final calculated generation serves as the best estimate of the optimal parameter. More details can be found in 7 .
o Metabolic rate estimation The objective function of the CMA-ES algorithm is the steady-state metabolic rate of walking with hip flexion assistance. The steady-state metabolic rate corresponding to each exosuit control parameter set was estimated by fitting a first-order dynamics model 8,9 to two minutes of breath-by-breath transient-state metabolic measurements. We chose a sample duration of two minutes by considering the tradeoff between the estimation accuracy 7 of the first-order dynamics model and the total number of function evaluations given the limited time budget.
• Protocol Details o Day 1 (Training day) On the training day, we ran a CMA-ES optimization algorithm for four generations (about 1 hour) to expose the participant to a variety of different force profiles in the parameter space. Additionally, to allow the participant to experience the biological torque inspired profile ( ) which is outside of parameter space, the participant walked with profile at four different force magnitudes ( -, -, -ℎ ℎ, -) for four minutes each.
o Day 2 (Optimization day) On optimization day (Supplementary Video S1), we first measured the resting metabolic rate while participants stood still for 4 minutes. Then, the participants completed a 5minute adaptation period of walking with a suit active at the starting condition of optimization. Participants then completed eight generations of HIL optimization (about 2 hours) with 5 minutes of rest and 5 minutes of adaptation in the middle. Each generation consisted of the objective function evaluation at seven different points in the parameter space, so there were 56 function evaluations in total (8 generations × 7 evaluation points/generation). The function evaluation was through 2 minutes of the metabolic rate estimation process. When transitioning from one profile to the other (i.e., between function evaluations), there were five transitioning strides where we gradually changed the assistance profile step by step to prevent a disturbance from an instantaneous change.
After the optimization process was completed, we measured the metabolic cost of walking with individualized optimal assistance ( -), fixed assistance ( -ℎ ℎ ), and walking with assistance turned off ( ). The conditions were conducted at the beginning and repeated a second time at the end and the average was taken to avoid an order effect. Two assist-on conditions were randomized. Each condition lasted 5 minutes, and participants were allowed at least a 2-minute break between conditions.
o Day 3 (Biomechanics day) On biomechanics day (Supplementary Video S2), we measured the resting metabolic rate first. Then, the participants completed a five-minute warm-up of walking without a suit and an eight-minute warm-up with assistance. The eight minutes of active warm-up consisted of four different conditions, each two minutes in length. The four conditions were -, one profile from timing, and two profiles from timing. The force levels for and timings are randomly chosen. Participants then completed a total of 11 conditions including two , one , and eight different assist-on conditions. The two conditions were conducted at the beginning and end of the protocol and the average was taken to avoid an order effect. All eight of the assist-on conditions and the one condition were randomized in the order. Each condition lasted 5 minutes, and participants were allowed at least a 2-minute break between conditions. We calculated the metabolic rate based on O2 and CO2 data gathered from the last two minutes of each condition using the Brockway equation 10 .
The eight assist-on conditions included three different assistance timings ( , , ) and five different assistance magnitudes denoted as subscript ( -, -, - , -ℎ ℎ , and -). The eight conditions tested were: The three different timings were defined as follows. was the individualized force profile timing resulting from HIL Optimization where the onset timing was fixed at MHE (0%MHE) and the peak timing and offset timing were individualized.
o Supplemental study (n = 1) -Repeating optimization To evaluate if the individualized optimal profile remains constant over time, we repeated optimization day on one male participant (Participant 3; 29 years; 183cm; 93kg) 1 month after his first Optimization Day visit. We measured the resting metabolic rate, and then a participant completed eight generations of HIL optimization. After the optimization process was completed, we measured the metabolic cost of the condition and two assist-on conditions: the old optimal assistance ( -) identified on Day 2 and the new optimal assistance ( -) found on re-optimization day. The conditions were conducted at the beginning and repeated a second time at the end, and the metabolic rate was calculated as the average of the two. The old and new optimal assistance conditions were conducted in a double-reversal order. Each condition was tested twice, first in random order, and then again in reverse order to mitigate the effects of adaptation or drift in respiratory measurements (order: oldnewnewold). The metabolic rate was calculated as the average of both double-reversal presentations.

• Physiological and Biomechanical Measurements in the Main Study
o Metabolic measurements We used an indirect calorimetry device (K5, Cosmed, Italy) to measure O2 consumption and CO2 production. Participants were asked to refrain from alcohol, caffeine, and strenuous activities for twelve hours before the study and fast for two hours before the study. We calculated the metabolic rate based on O2 and CO2 data gathered from the last two minutes of each condition using the Brockway equation 10 . The net metabolic rate for each condition was obtained by subtracting the metabolic rate obtained during a standing trial performed at the beginning, from the metabolic power calculated during the walking conditions. All net metabolic rates and reductions were normalized to the participant's body mass.
o Kinematics and kinetic measurements We measured body segment motions using a motion capture system (Qualisys, Sweden) and reflective markers placed according to a Helen-Hayes-type marker set. We measured ground reaction forces using an instrumented treadmill (Bertec, OH, USA) at 2 kHz. We filtered motion capture data and ground reaction force data using a zero-lag, fourth-order, low-pass Butterworth filter with a cutoff frequency at 7 Hz. We calculated joint kinematics and total (i.e., human plus the exosuit) joint moments and powers of the left leg in a sagittal plane using inverse kinematics and dynamics analyses in Visual3D (C-Motion, MD, USA). We normalized the total joint moments and powers by each participant's body mass. We calculated the exosuit joint moment by multiplying the forces from the exosuit load cell with a moment arm of the force path of the Bowden cable relative to the hip joint calculated using motion capture. We calculated the exosuit joint power by multiplying the exosuit moment by the hip joint angular velocity. We calculated the portion of the hip moment and power that is delivered by the biological muscles and passive tissues by subtracting the contribution of the exosuit from the total joint moment and power calculated by inverse dynamics. We used an automatic gait event detection algorithm (Visual3D, C-Motion, MD, USA) to determine heel strikes that defined gait cycles. Ten strides per condition were used for generating mean kinematic and kinetic data for each participant. The Qualisys data file which contained Motion Capture, GRF, and EMG data from the condition of one participant (Participant 3) was corrupted and thus was excluded from this analysis. Metabolic data which was logged separately and uncorrupted was still included.
o Apparent joint efficiency To compute the apparent joint efficiency, we divided the average exosuit positive mechanical power (bilateral sum) by changes in net metabolic power as suggested in 16 .
The average exosuit positive mechanical power was calculated by integrating only the positive portions of the left exosuit mechanical power curves (from heel-strike to heelstrike), multiplying by 2 (with the assumption of symmetry between left and right legs), and dividing the total by the average stride period. The change in net metabolic power was calculated by subtracting the net metabolic power during the optimal assistance condition ( -) from the net metabolic power during condition.
o Electromyography measurements On Biomechanics Day, we measured activation of the left and right gluteus maximus, biceps femoris, rectus femoris, vastus lateralis, erector spinae and the left gastrocnemius medialis, soleus, and tibialis anterior muscles using surface electromyography (sEMG) at 2 kHz (Delsys, MA, USA). We processed sEMG data using a 20-to 450-Hz bandpass filter, rectification, and a 6-Hz low-pass filter. For each participant, we discarded parts of sEMG data based on visual inspection for artifacts due to sensor motion, sensor placement, sensor baseline drift, and cable actuation 17 . By default, we used sEMG data of the left leg for analysis, and we normalized sEMG based on the average of the two conditions. When artifacts were identified, we used sEMG data of the right leg (in 14.1% of all data), or we normalized sEMG data using only a single condition (in 15.5% of all data). Finally, if artifacts were identified in both legs we discarded the data for that condition (in 15.9% of data).

• The Simulation of HIL Optimization
o Motivation One of the most important things in the HIL optimization study in the exoskeleton field is the experimental validation of the optimization algorithm (i.e., whether the optimization algorithm does truly find the optimum or not). Of the all HIL study on various lower-limb joints with different algorithms 3,7,8,[18][19][20] , only three studies experimentally validated their algorithms, and all were only one-dimensional parameter spaces 8,19,20 . This is mainly because the number of grid points that need to be sampled for experimental validation exponentially increases, as the dimension of parameter space increases. Furthermore, given the high noise level in the steady-state metabolic data, each point needs to be sampled multiple times to minimize the variability. Considering that a function evaluation (i.e., measuring the steady-state metabolic rate) at one point takes 5-6 minutes, it is extremely time-consuming and very difficult to experimentally prove that the HIL algorithm successfully found the global optimum, especially for higher-dimension optimization problems. Thus, we utilized simulation to confirm that this optimization procedure could find a global optimum under realistic metabolic noise conditions and to determine the best hyperparameter setting for the algorithm.
o Parameter space Based on pilot experiments with hip flexion assistance, we found that both timing and magnitude of assistance cause a considerable change in the metabolic cost of assisted walking. Ideally, optimizing three timing parameters (onset, peak, and offset timings) and a peak force magnitude parameter is desirable. However, with further pilot testing, we found that the earlier onset timing is better, but onset timing earlier than MHE is detrimental to metabolic rate. Given that we had a limited time budget for fourdimensional optimization, we aimed to conduct three-dimensional optimization, where the three parameters were peak timing ( ), offset timing ( ), and the magnitude of peak force ( ) of the assistance profile (Fig. 1a). The onset timing was set to be 0%MHE. It is worthwhile to note that the biological torque inspired timing ( ) was outside of parameter space but was included for the comparison for evaluation on Biomechanics Day.
We set a few constraints on these parameters based on hardware limitations and an underlying biomechanical rationale. The magnitude of peak force was set to be higher than 60 N and lower than 150 N. Forces lower than this range had a negligible impact on metabolic rate during pilot testing and force above this range had a clear negative impact on metabolic rate. The offset timing was set to be no later than 45%MHE in order not to interrupt lower limb motion in an early stance phase. Finally, the duration between the onset and peak timings and the duration between the peak and offset timings were required to be at least 10% of the gait cycle due to the rising and falling time of the force controller.
o Objective function The objective function of the optimization algorithm in our study is the metabolic rate of walking with hip flexion assistance, where the shape of the assistance profile is determined by three parameters ( , , ) to be optimized. To mimic the actual objective function in a simulation, we included additive zero-mean Gaussian noise to the noise-free ground-truth objective function. A ratio between the range of objective function and the amount of noise (i.e., signal variation to noise ratio) was set based on the literature and experimental pilot testing data. More details about this ratio can be found in the next section, signal variation to noise ratio (SNR). In this simulation, we assigned a ground-truth objective function as follows.
First, we assumed only one global optimum with no local optimum. The objective function with multiple local optima is a possible option, but the analysis will be complicated depending on the number of local optima, the relative locations of local optima with respect to global optimum, and the relative values of the objective function at the local optima with respect to that of the global optimum. Furthermore, there is no strong evidence that multiple local optima exist in the metabolic landscape in the literature. To draw a clear and simple conclusion, we assumed there was a single global optimum.
Second, we assumed that the objective function value increases linearly as the distance from the global optimum increases. Here, each parameter dimension is normalized to have a value between 0 and 1, and the Euclidean distance in the three-dimensional parameter space is used so that the distance is not dominated by one parameter due to their different units. This linear objective function is not an oversimplified assumption, because when projecting the linear objective function to lower-dimensional parameter space, the metabolic landscape will appear as a bowl-shape which is prevalent in many literatures.
Third, we did not fix the location of the global optimum in the parameter space across multiple simulation runs. Instead, we uniformly sampled 1000 points in the parameter space and used these points for each simulation run. The location of the global optimum point does affect the performance of the optimization algorithm. For example, the CMA-ES algorithm is initially designed for an unconstrained optimization problem, and thus it is vulnerable to the case where the ground-truth optimal point is located close to the boundary of the parameter space. Therefore, we systematically varied the location of the optimum point and reported the performance of the algorithm along with this optimum location change.
o Signal variation to Noise Ratio (SNR) The ratio of the signal variation across the parameter space (i.e., the range of the objective function) relative to the noise in the measured signal has a considerable impact on the difficulty of the optimization problem (e.g., when SNR is small, finding the global optimum is extremely difficult). To estimate SNR, we identified the signal variation and the noise, respectively as follows.
To find the signal variation in the parameter space, we applied a variety of different assistance profiles to the participants and measured the net metabolic rate of walking during the pilot tests. The signal variation over the parameter space was found to be 1.223 W kg -1 .
Additionally, there are two different sources of metabolic noise: (i) the estimation error caused by instantaneous cost mapping and (ii) the inherent repeatability error in the metabolic data. The estimation error is caused by instantaneous cost mapping, the fitting process used to quickly estimate the steady-state metabolic cost for each iteration of the optimization process. At each iteration, we fit two minutes of transient data to the firstorder dynamics of the human respiratory system with a known time constant 9 to estimate the steady-state metabolic rate for that condition. The estimation error would decrease if a longer duration of transient data is included in the fitting. However, with a fixed budget of total optimization time, there is a trade-off between minimizing estimation error and maximizing the number of iterations. When using the two minutes of transient data, the mean of estimation error was identified as 4.3% of the net metabolic rate 7 . We modeled this estimation error ( ) as a zero-mean Gaussian noise, and the standard deviation ( ) of Gaussian noise can be identified as follows.
Even if we measure the steady-state metabolic data following standard practices (i.e., the participant performs the same activity for five minutes and the last two minutes of steadystate data is used to calculate the metabolic rate), there is an inherent repeatability error. This error can be easily seen by repeating the same condition multiple times and comparing the steady-state metabolic rate. The mean repeatability error varied from 2.6% to 4.8% of the net metabolic rate when analyzing the experimental data available in 21 . Similar to the estimation error, we modeled this repeatability error ( ) as a zero-mean Gaussian noise, and the standard deviation ( ) of Gaussian noise can be identified as follows.
There is quite large uncertainty in the repeatability error, so after doing the main threeday protocol, we validated the error range with experimental data in this study. The repeatability error was around = 3.9% of the net metabolic rate in this study.
Finally, by assuming estimation and repeatability errors are independent, we can model the total error ( ) as a zero-mean Gaussian noise with the following standard deviation.
= √ 2 + 2 = 6.7% of the net metabolic rate (9) where means the noise-free ground-truth value of the net metabolic rate and means the noisy measurement of the net metabolic rate.
When translating the standard deviation of the total error to signal variation ( ), it was estimated as approximately 18.2%.
o Performance metric To quantify the accuracy of the optimization algorithm, we used the normalized Euclidean distance between the ground-truth optimum and the point found by the algorithm. Again, we used normalized distance to avoid a biased result because of different parameter units.
During pilot testing, it was difficult for the participant to differentiate the timing difference smaller than 3% or the force magnitude difference smaller than 15 N. Furthermore, we didn't see clear metabolic differences between two profiles whose timings and/or peak force differences are within 3% or 15 N. Based on these, we concluded that a good target for the optimization was to on average be within 3% of both timings and 15 N of the peak force, which is equivalent to a normalized distance of 0.238.
o Hyperparameters and simulation settings There are two important hyperparameters in CMA-ES to be determined through the simulation: (i) the number of generations (i.e., the total number of iterations) and (ii) the penalization method. Regarding the number of generations, if we run an optimization algorithm for only a small number of generations, the algorithm will not converge and fail to find the true optimum. On the other hand, if we run an algorithm with too many generations, we will increase the required walking time without improving the algorithm accuracy. Thus, we aimed to identify the appropriate number of generations by varying this hyperparameter between 1 and 10 in the simulation.
The CMA-ES algorithm was originally devised for the unconstrained optimization problem but to use this algorithm in a constrained optimization problem, we need to penalize the objective function value when we sample the point outside of the parameter space. We explored three different penalization methods via simulation. The first method is a simple repair method 6 that moves an infeasible point to the closest feasible point in the parameter space (penalization method 1). The second method is resampling 6 any infeasible point until it becomes feasible (penalization method 2). The third method is evaluating the objective function on a repaired search point (i.e., closest feasible point) and adding a penalty depending on the distance to the repaired solution (penalization method 3). Here, a two-norm distance with an adaptive coefficient was used 22 .
In addition to the two aforementioned hyperparameters, there are two key simulation settings: (i) the SNR and (ii) the location of the optimal point. Although we estimated the amount of noise as accurately as possible based on the literature and experimental data, there is still uncertainty in the SNR that is hard to be identified beforehand. Thus, we varied the SNR from 0% to 35% over 5% increments and evaluated how the algorithm accuracy changed as the noise level increased. Finally, since we don't have any information where the optimum point will be located in the parameter space, we uniformly sampled 1000 points from the parameter space and assigned each point as ground truth for one simulation run.
After running simulations for every possible combination of the aforementioned different hyperparameters and simulation settings, we calculated the mean and the standard error of the mean of the performance metric for 1000 different optimum locations to see the general trend with respect to the number of generations, the penalization method, and the SNR. In the simulations, we assigned a different seed number to the random number generator for different locations of ground truth optimum but kept the seed number the same when changing the other three simulation settings (the number of generations, the penalization method, and the SNR).

• The Simulation of HIL Optimization o Covariance matrix adaptation evolutionary strategy (CMA-ES)
The simulation result is shown in Supplementary Fig. S1. For all subfigures in Supplementary Fig. S1, the y-axis is the normalized distance between the ground-truth optimum and the optimal point found by the optimization algorithm. The smaller distance means improved algorithm accuracy. As we increase the number of generations, the algorithm accuracy is monotonically improved. However, this trend levels off after around the eighth generation ( Supplementary Fig. S1a). Furthermore, among three different penalization methods tested, the simple repair method (moving an infeasible point to the closest feasible point) was the most accurate no matter how many generations were run. Here, the noise level was set to 20% of signal variation which is the closest option tested to the best estimate of the noise. Based on this result, we set hyperparameters of the CMA-ES algorithm as 8 generations and a simple repair method. Given the high uncertainty in the noise level, we plotted the performance metric over noise ( Supplementary Fig. S1b) with these hyperparameter settings. When the noise level was 18.2% of signal variation which is the best estimate given the data available, the mean of normalized distance was 0.195, which equates to a 2.8% mean error in peak and offset timings and 10.1 N mean error in the peak force magnitude, which was better than our target of 0.238 mentioned above.
As expected, the algorithm accuracy deteriorates as the noise level increases. Interestingly, this relationship (normalized distance between the ground truth and the algorithm solution versus standard deviation of zero-mean Gaussian noise) was close to linear ( Supplementary Fig. S1b). When the ground-truth optimum is located closer to the boundary of the parameter space, the algorithm accuracy gets worse (Supplementary Fig.  S1c and S1d). This trend is somewhat expected because if the optimum is closer to the boundary, there are higher chances that the infeasible points are sampled during the optimization process.
o Bayesian Optimization (BO) We repeated the exact same simulation setting with Bayesian optimization. After tuning hyperparameters (e.g., kernel function, kernel parameters, etc.) in BO, we compared the performance between BO and CMA-ES. Bayesian optimization was highly accurate and sample efficient, but it was less robust to the noise. Its performance quickly degraded when noise increased (e.g., Bayesian optimization was prone to suggest an incorrect optimal point located at the boundary.), and when the noise level reached 18.2% of signal variation which is the best estimate given the data available, Bayesian optimization performed worse than CMA-ES.

• The Experiment of HIL Optimization
During human participant testing, we used the CMA-ES algorithm over eight generations with a simple repair method as boundary conditions to estimate optimal assistance profiles for each participant. Based on our preliminary simulation, the CMA-ES algorithm with these specific settings most accurately identified the optimal profile given the low signal-to-noise ratio (SNR) and the limited time budget for three-dimensional parameter space in this problem (Supplementary Results and Fig. S1).
For each participant, the CMA-ES algorithm found the optimal assistance profile with a small scale of the search distribution at the last generation ( Supplementary Fig. S2a). The objective function (i.e., metabolic rate) showed a decreasing trend as the algorithm iterated over generations and eventually leveled off at the end ( Supplementary Fig. S2b), confirming that the optimization was run successfully with converged parameters (Supplementary Fig. S2c).

• Supplemental Study (n = 1) -Repeating Optimization
As a supplemental study, we repeated the optimization on one participant as a preliminary investigation into how much the optimal settings changed over time and if the updated optimization settings resulted in a noticeably lower metabolic cost than the old settings. Thus, a month after the main study, we re-ran the HIL optimization on one participant to identify new optimal settings and then immediately experimentally measured the participant walking in those new settings ( -), the old optimal settings ( -) found during the main study above, and walking with no assistance ( ). We repeated each of the three conditions twice to account for the noise in the metabolic measurement. The results are shown in Supplementary Fig.  S3. The old optimization settings were 20.6%MHE peak, 39.9%MHE offset, and 127 N; and resulted in a 6.9% (0.25 W kg -1 ) metabolic reduction relative to . The new optimization settings were 12.4%MHE peak, 33.2%MHE offset, and 143 N; and resulted in an 8.6% (0.31 W kg -1 ) metabolic reduction relative to . Thus, the optimal assistance settings did change (more than the minimum timing and peak force differences that can differentiate two profiles), and the new optimal profile performed metabolically better than the old optimal profile, although with only N=1, the metabolic difference between the two was not big enough to account for the repeatability error in the metabolic cost (Supplementary Methods).

• Physiological and Biomechanical Measurements in the Main Study o Kinematics and Dynamics
To isolate the passive effects of wearing the suits from the active biomechanical effect of assistance, we compared and conditions using paired t tests. It is worth noting that there were a few small but significant differences between the two, summarized in Supplementary Table S6. o Electromyography measurements There were no significant changes in any of the sEMG signals between and . For -, the integral of vastus lateralis during stance increased (0.059 ± 0.015; P < 0.01) relative to . There were no significant changes between and -. For -, the integral of vastus lateralis during stance increased (0.064 ± 0.017; P < 0.01) relative to but was not significantly different from -. Finally increases in peak exosuit torque across the force sweep conditions ( -ℎ ℎ , and -) resulted in a linear increase in the integral of tibialis anterior (R 2 = 0.88; P < 0.001), biceps femoris (R 2 = 0.96; P < 0.01), and erector spinae (R 2 = 0.92; P < 0.01) using a linear random intercept mixedeffects model.

• The Experiment of HIL Optimization
In this study, we were not able to experimentally prove that the HIL algorithm successfully found the global optimum. Given the three-dimensional parameter space and low signal-to-noise ratio of metabolic measurements, it would be very difficult to experimentally validate but we do have confidence it converged to the most accurately identified global optimum based on our preliminary simulation and CMA-ES algorithm experiment results (Supplementary Results).

• Supplemental Study (n = 1) -Repeating Optimization
The optimal parameters utilized for the Optimization Day were a result of the HIL optimization run on that day but on the Biomechanics Day no optimization was run. Instead, we utilized the previously identified optimal parameters, assuming that those parameters still remained optimal across days. If these optimal parameters were to change between days, as found in the repeating optimization supplemental study (n = 1), that could also explain the smaller metabolic reduction found on the Biomechanics Day. However, there were no significant changes between Optimization Day and Biomechanics Day in the net metabolic rate of optimal assistance (P = 0.41), the net metabolic rate of (P = 0.87), or the relative difference between optimal assistance and (P = 0.35).
The repeated optimization results indicate that the optimal profile may vary over time and thus optimizing an individual's profile periodically may be required. However, this is solely based on a single-participant test and further testing and validation of statistical significance are needed. If the optimal profile does vary over time, measuring indirect calorimetry is too time-intensive to implement in a real-world environment, and thus a different approach is needed if we want to optimize exoskeletons in daily life. Finding a proxy objective function that highly correlates with metabolic energy and that can be measured easily in real-time will be the first step of the new paradigm of optimization. Ingraham et al. 23 has initiated this work by systematically evaluating how multiple physiological signals (e.g., limb accelerations, lower limb EMG, heart rate, skin temperature), both alone and in combination, can predict the metabolic cost of various activities.
They have used such physiological signals in tandem with metabolic measurements to increase the speed of predicting instantaneous metabolic cost 24 but have yet to develop a proxy function that does not use metabolic measurements. Further work is needed to identify an appropriate proxy function.

• Physiological and Biomechanical Measurements in the Main Study o Increasing average biological hip extension torque
During portions of the swing phase, exosuit torque and power exceeded that of net total hip torque and power. This resulted in increased average biological hip extension torque and average negative hip biological power in and -, and they were correlated with increases in peak exosuit torque during the force sweep conditions. The average biological hip extension torque increased between and by 0.032 ± 0.011 Nm kg -1 (P < 0.001) and between and by 0.029 ± 0.009 Nm kg -1 (P < 0.001). The average negative hip biological power increased in magnitude between and by -0.013 ± 0.011 W kg -1 (P < 0.05) and between and by -0.018 ± 0.012 W kg -1 (P < 0.01).
Due to the more net biological torque inspired timing there is lower average biological hip extension torque (-0.014 ± 0.012 Nm kg -1 ; P < 0.05) and average negative hip biological power (0.026 ± 0.013 W kg -1 ; P < 0.001) in condition compared to -. Although the average biological hip extension torque did still increase in magnitude relative to (0.018 ± 0.014 Nm kg -1 ; P < 0.05).
The increases in biological torque/power are not necessarily detrimental, they may just be revealing the natural underlying co-contraction of hip flexion and extension muscles during swing 25 . These findings align with the simulation results of Dembia et al. which found that when minimizing muscle actuation with hip flexion assistance, the device torque exceeded the net joint moment 26 . However, it is possible that during certain periods of the gait cycle the exosuit torques during the higher assistance conditions (120 and 150 N) exceeded those of the underlying musculature and negatively influenced metabolic cost.
o Electromyography measurements It is interesting that we don't see any significant reductions in EMG despite the fact that we see decreases in metabolic cost. Due to the constraints of surface EMG, the rectus femoris was the only muscle we measured which contributes to hip flexion. We were not able to measure anything from the four other muscles which contribute to hip flexion (iliacus, psoas, iliocapsularis, and sartorius). No significant changes in the peak or integral of the rectus femoris signal were found across conditions. The increases in tibialis anterior, biceps femoris, and erector spinae during force sweep are small but significant. Figure S1.    Tables   Table S1. Participant Anthropometrics and Optimal Parameters. The age, mass, height, individualized control parameters identified by HIL optimization on Optimization Day, and optimal peak force identified by one dimensional (1D) validation on Biomechanics Day for each of the eight participants. The optimal force magnitude from 1D validation was chosen from the best net metabolic rate across the force sweep conditions (

Figures
-ℎ ℎ , and -) in Supplementary Table S3. * Two conditions ( -and -ℎ ℎ ) were tied for one participant (P6), and the average between the two was chosen as the optimal force magnitude. SD is the standard deviation.     Max Hip Ext. Angle degrees -9.8 ± 0.9 -9.4 ± 1.0 -9.1 ± 0.9 -8.8 ± 0.9 -6.0 ± 1.0 #  Table S8. Apparent joint efficiencyhip extension. Studies were included if they met the following criteria: (i) used an active, bilateral exoskeleton to independently assist hip extension; (ii) demonstrated a metabolic reduction relative to powered off; and (iii) provided a bilateral sum of average positive exoskeleton power. Apparent joint efficiency was calculated as a ratio of the average bilateral positive exoskeleton power to the metabolic reduction. SEM is the standard error of the mean.  Table S9. Apparent joint efficiencyankle plantarflexion. Studies were included if they met the following criteria: (i) used an active, bilateral exoskeleton to independently assist ankle plantarflexion; (ii) demonstrated a metabolic reduction relative to powered off; and (iii) provided a bilateral sum of average positive exoskeleton power. Apparent joint efficiency was calculated as a ratio of the average bilateral positive exoskeleton power to the metabolic reduction. SEM is the standard error of the mean.